{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.11 残差网络（ResNet）\n",
    "\n",
    "让我们先思考一个问题：对神经网络模型添加新的层，充分训练后的模型是否只可能更有效地降低训练误差？理论上，原模型解的空间只是新模型解的空间的子空间。也就是说，如果我们能将新添加的层训练成恒等映射$f(x) = x$，新模型和原模型将同样有效。由于新模型可能得出更优的解来拟合训练数据集，因此添加层似乎更容易降低训练误差。然而在实践中，添加过多的层后训练误差往往不降反升。即使利用批量归一化带来的数值稳定性使训练深层模型更加容易，该问题仍然存在。针对这一问题，何恺明等人提出了残差网络（ResNet） [1]。它在2015年的ImageNet图像识别挑战赛夺魁，并深刻影响了后来的深度神经网络的设计。\n",
    "\n",
    "\n",
    "## 5.11.1残差块\n",
    "\n",
    "让我们聚焦于神经网络局部。如图5.9所示，设输入为$\\boldsymbol{x}$。假设我们希望学出的理想映射为$f(\\boldsymbol{x})$，从而作为图5.9上方激活函数的输入。左图虚线框中的部分需要直接拟合出该映射$f(\\boldsymbol{x})$，而右图虚线框中的部分则需要拟合出有关恒等映射的残差映射$f(\\boldsymbol{x})-\\boldsymbol{x}$。残差映射在实际中往往更容易优化。以本节开头提到的恒等映射作为我们希望学出的理想映射$f(\\boldsymbol{x})$。我们只需将图5.9中右图虚线框内上方的加权运算（如仿射）的权重和偏差参数学成0，那么$f(\\boldsymbol{x})$即为恒等映射。实际中，当理想映射$f(\\boldsymbol{x})$极接近于恒等映射时，残差映射也易于捕捉恒等映射的细微波动。图5.9右图也是ResNet的基础块，即残差块（residual block）。在残差块中，输入可通过跨层的数据线路更快地向前传播。\n",
    "\n",
    "![设输入为$\\boldsymbol{x}$。假设图中最上方激活函数输入的理想映射为$f(\\boldsymbol{x})$。左图虚线框中的部分需要直接拟合出该映射$f(\\boldsymbol{x})$，而右图虚线框中的部分需要拟合出有关恒等映射的残差映射$f(\\boldsymbol{x})-\\boldsymbol{x}$](../img/residual-block.svg)\n",
    "\n",
    "ResNet沿用了VGG全$3\\times 3$卷积层的设计。残差块里首先有2个有相同输出通道数的$3\\times 3$卷积层。每个卷积层后接一个批量归一化层和ReLU激活函数。然后我们将输入跳过这两个卷积运算后直接加在最后的ReLU激活函数前。这样的设计要求两个卷积层的输出与输入形状一样，从而可以相加。如果想改变通道数，就需要引入一个额外的$1\\times 1$卷积层来将输入变换成需要的形状后再做相加运算。\n",
    "\n",
    "残差块的实现如下。它可以设定输出通道数、是否使用额外的$1\\times 1$卷积层来修改通道数以及卷积层的步幅。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from tensorflow.keras import layers,activations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Residual(tf.keras.Model):\n",
    "    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):\n",
    "        super(Residual, self).__init__(**kwargs)\n",
    "        self.conv1 = layers.Conv2D(num_channels,\n",
    "                                   padding='same',\n",
    "                                   kernel_size=3,\n",
    "                                   strides=strides)\n",
    "        self.conv2 = layers.Conv2D(num_channels, kernel_size=3,padding='same')\n",
    "        if use_1x1conv:\n",
    "            self.conv3 = layers.Conv2D(num_channels,\n",
    "                                       kernel_size=1,\n",
    "                                       strides=strides)\n",
    "        else:\n",
    "            self.conv3 = None\n",
    "        self.bn1 = layers.BatchNormalization()\n",
    "        self.bn2 = layers.BatchNormalization()\n",
    "\n",
    "    def call(self, X):\n",
    "        Y = activations.relu(self.bn1(self.conv1(X)))\n",
    "        Y = self.bn2(self.conv2(Y))\n",
    "        if self.conv3:\n",
    "            X = self.conv3(X)\n",
    "        return activations.relu(Y + X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面我们来查看输入和输出形状一致的情况。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TensorShape([4, 6, 6, 3])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blk = Residual(3)\n",
    "#tensorflow input shpe     (n_images, x_shape, y_shape, channels).\n",
    "#mxnet.gluon.nn.conv_layers    (batch_size, in_channels, height, width) \n",
    "X = tf.random.uniform((4, 6, 6 , 3))\n",
    "blk(X).shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们也可以在增加输出通道数的同时减半输出的高和宽。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TensorShape([4, 3, 3, 6])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "blk = Residual(6, use_1x1conv=True, strides=2)\n",
    "blk(X).shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.11.2 ResNet模型\n",
    "\n",
    "ResNet的前两层跟之前介绍的GoogLeNet中的一样：在输出通道数为64、步幅为2的$7\\times 7$卷积层后接步幅为2的$3\\times 3$的最大池化层。不同之处在于ResNet每个卷积层后增加的批量归一化层。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "net = tf.keras.models.Sequential(\n",
    "    [layers.Conv2D(64, kernel_size=7, strides=2, padding='same'),\n",
    "    layers.BatchNormalization(), layers.Activation('relu'),\n",
    "    layers.MaxPool2D(pool_size=3, strides=2, padding='same')])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个模块的通道数同输入通道数一致。由于之前已经使用了步幅为2的最大池化层，所以无须减小高和宽。之后的每个模块在第一个残差块里将上一个模块的通道数翻倍，并将高和宽减半。\n",
    "\n",
    "下面我们来实现这个模块。注意，这里对第一个模块做了特别处理。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ResnetBlock(tf.keras.layers.Layer):\n",
    "    def __init__(self,num_channels, num_residuals, first_block=False,**kwargs):\n",
    "        super(ResnetBlock, self).__init__(**kwargs)\n",
    "        self.listLayers=[]\n",
    "        for i in range(num_residuals):\n",
    "            if i == 0 and not first_block:\n",
    "                self.listLayers.append(Residual(num_channels, use_1x1conv=True, strides=2))\n",
    "            else:\n",
    "                self.listLayers.append(Residual(num_channels))      \n",
    "    \n",
    "    def call(self, X):\n",
    "        for layer in self.listLayers.layers:\n",
    "            X = layer(X)\n",
    "        return X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接着我们为ResNet加入所有残差块。这里每个模块使用两个残差块。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ResNet(tf.keras.Model):\n",
    "    def __init__(self,num_blocks,**kwargs):\n",
    "        super(ResNet, self).__init__(**kwargs)\n",
    "        self.conv=layers.Conv2D(64, kernel_size=7, strides=2, padding='same')\n",
    "        self.bn=layers.BatchNormalization()\n",
    "        self.relu=layers.Activation('relu')\n",
    "        self.mp=layers.MaxPool2D(pool_size=3, strides=2, padding='same')\n",
    "        self.resnet_block1=ResnetBlock(64,num_blocks[0], first_block=True)\n",
    "        self.resnet_block2=ResnetBlock(128,num_blocks[1])\n",
    "        self.resnet_block3=ResnetBlock(256,num_blocks[2])\n",
    "        self.resnet_block4=ResnetBlock(512,num_blocks[3])\n",
    "        self.gap=layers.GlobalAvgPool2D()\n",
    "        self.fc=layers.Dense(units=10,activation=tf.keras.activations.softmax)\n",
    "\n",
    "    def call(self, x):\n",
    "        x=self.conv(x)\n",
    "        x=self.bn(x)\n",
    "        x=self.relu(x)\n",
    "        x=self.mp(x)\n",
    "        x=self.resnet_block1(x)\n",
    "        x=self.resnet_block2(x)\n",
    "        x=self.resnet_block3(x)\n",
    "        x=self.resnet_block4(x)\n",
    "        x=self.gap(x)\n",
    "        x=self.fc(x)\n",
    "        return x\n",
    "    \n",
    "mynet=ResNet([2,2,2,2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后，与GoogLeNet一样，加入全局平均池化层后接上全连接层输出。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里每个模块里有4个卷积层（不计算 1×1卷积层），加上最开始的卷积层和最后的全连接层，共计18层。这个模型通常也被称为ResNet-18。通过配置不同的通道数和模块里的残差块数可以得到不同的ResNet模型，例如更深的含152层的ResNet-152。虽然ResNet的主体架构跟GoogLeNet的类似，但ResNet结构更简单，修改也更方便。这些因素都导致了ResNet迅速被广泛使用。\n",
    "在训练ResNet之前，我们来观察一下输入形状在ResNet不同模块之间的变化。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "conv2d_6 output shape:\t (1, 112, 112, 64)\n",
      "batch_normalization_5 output shape:\t (1, 112, 112, 64)\n",
      "activation_1 output shape:\t (1, 112, 112, 64)\n",
      "max_pooling2d_1 output shape:\t (1, 56, 56, 64)\n",
      "resnet_block output shape:\t (1, 56, 56, 64)\n",
      "resnet_block_1 output shape:\t (1, 28, 28, 128)\n",
      "resnet_block_2 output shape:\t (1, 14, 14, 256)\n",
      "resnet_block_3 output shape:\t (1, 7, 7, 512)\n",
      "global_average_pooling2d output shape:\t (1, 512)\n",
      "dense output shape:\t (1, 10)\n"
     ]
    }
   ],
   "source": [
    "X = tf.random.uniform(shape=(1,  224, 224 , 1))\n",
    "for layer in mynet.layers:\n",
    "    X = layer(X)\n",
    "    print(layer.name, 'output shape:\\t', X.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.11.3  获取数据和训练模型\n",
    "\n",
    "下面我们在Fashion-MNIST数据集上训练ResNet。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train on 48000 samples, validate on 12000 samples\n",
      "Epoch 1/5\n",
      "48000/48000 [==============================] - 1687s 35ms/sample - loss: 0.4712 - accuracy: 0.8334 - val_loss: 0.3751 - val_accuracy: 0.8681\n",
      "Epoch 2/5\n",
      "48000/48000 [==============================] - 1729s 36ms/sample - loss: 0.3195 - accuracy: 0.8837 - val_loss: 0.2905 - val_accuracy: 0.8945\n",
      "Epoch 3/5\n",
      "48000/48000 [==============================] - 1910s 40ms/sample - loss: 0.2796 - accuracy: 0.8982 - val_loss: 0.3907 - val_accuracy: 0.8698\n",
      "Epoch 4/5\n",
      "48000/48000 [==============================] - 1952s 41ms/sample - loss: 0.2563 - accuracy: 0.9062 - val_loss: 0.3572 - val_accuracy: 0.8705\n",
      "Epoch 5/5\n",
      "48000/48000 [==============================] - 1940s 40ms/sample - loss: 0.2407 - accuracy: 0.9119 - val_loss: 0.2996 - val_accuracy: 0.8917\n",
      "10000/1 - 14s - loss: 0.3452 - accuracy: 0.8917\n"
     ]
    }
   ],
   "source": [
    "(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()\n",
    "x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255\n",
    "x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255\n",
    "\n",
    "mynet.compile(loss='sparse_categorical_crossentropy',\n",
    "              optimizer=tf.keras.optimizers.Adam(),\n",
    "              metrics=['accuracy'])\n",
    "\n",
    "history = mynet.fit(x_train, y_train,\n",
    "                    batch_size=64,\n",
    "                    epochs=5,\n",
    "                    validation_split=0.2)\n",
    "test_scores = mynet.evaluate(x_test, y_test, verbose=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "* 残差块通过跨层的数据通道从而能够训练出有效的深度神经网络。\n",
    "* ResNet深刻影响了后来的深度神经网络的设计。\n",
    "\n",
    "\n",
    "## 练习\n",
    "\n",
    "* 参考ResNet论文的表1来实现不同版本的ResNet [1]。\n",
    "* 对于比较深的网络， ResNet论文中介绍了一个“瓶颈”架构来降低模型复杂度。尝试实现它 [1]。\n",
    "* 在ResNet的后续版本里，作者将残差块里的“卷积、批量归一化和激活”结构改成了“批量归一化、激活和卷积”，实现这个改进（[2]，图1）。\n",
    "\n",
    "\n",
    "\n",
    "## 参考文献\n",
    "\n",
    "[1] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).\n",
    "\n",
    "[2] He, K., Zhang, X., Ren, S., & Sun, J. (2016, October). Identity mappings in deep residual networks. In European Conference on Computer Vision (pp. 630-645). Springer, Cham.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": true,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
